22 research outputs found
On Use of Task Independent Training Data in Tandem Feature Extraction
This paper addresses whether a feature extraction module trained on large amounts of task-independent data can improve the performance of stochastic models. We show that when only a small amount of task-specific training data is available, tandem features trained on task-independent data give a considerable improvement over Perceptual Linear Prediction (PLP) cepstral features in Hidden Markov Model (HMM) based speech recognition systems.
Using RASTA in task independent TANDEM feature extraction
In this work, we investigate the use of the RASTA filter in the TANDEM feature extraction method when it is trained on task-independent data. The RASTA filter removes the linear distortion introduced by the communication channel, which yields an 18% relative improvement on the Numbers 95 task. In addition, combining TANDEM features with conventional PLP features yielded a relative improvement of 35% over the basic PLP features.
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
This paper provides an overall introduction of our Automatic Speech
Recognition (ASR) systems for Southeast Asian languages. As not much existing
work has been carried out on such regional languages, a few difficulties should
be addressed before building the systems: limitation on speech and text
resources, lack of linguistic knowledge, etc. This work takes Bahasa Indonesia
and Thai as examples to illustrate the strategies of collecting various
resources required for building ASR systems. Comment: Published by the 2017 IEEE International Conference on Orange
Technologies (ICOT 2017)
Multi-resolution Spectral Entropy Based Feature for Robust ASR
Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR) tasks. In a recent paper, we proposed that the formant positions of a spectrum can be captured by a multi-resolution spectral entropy feature. In this paper, we suggest modifications to the spectral entropy feature extraction approach and compute the entropy contribution of each sub-band to the total entropy of the normalized spectrum. Further, we explore overlapping sub-bands and the time derivatives of the spectral entropy feature. The modified feature is robust to additive wide-band noise and performs well at low SNRs. Finally, in the TANDEM framework, we show that the system using combined entropy and PLP features works better than the baseline PLP feature for additive wide-band noise at different SNRs.
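As a rough illustration of the sub-band entropy contribution described in this abstract, the sketch below normalizes a power spectrum so that it sums to one and then accumulates the terms -p log p over each sub-band. The function name, the equal-width band layout, and the `overlap` parameter (extra bins borrowed from neighboring bands) are assumptions made for this example; the abstract does not specify the paper's exact band layout.

```python
import math


def subband_entropy_contributions(power_spectrum, num_bands, overlap=0):
    """Entropy contribution of each sub-band to the normalized spectrum.

    Hypothetical helper: equal-width bands and the `overlap` parameter
    are illustrative assumptions, not the layout used in the paper.
    """
    total = sum(power_spectrum)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    n = len(power_spectrum)
    width = n // num_bands
    if width == 0:
        raise ValueError("more bands than spectral bins")

    # Treat the normalized spectrum as a probability mass function.
    pmf = [x / total for x in power_spectrum]

    features = []
    for b in range(num_bands):
        start = max(0, b * width - overlap)
        # The last band absorbs any leftover bins.
        stop = n if b == num_bands - 1 else min(n, (b + 1) * width + overlap)
        # Zero-probability bins contribute nothing (limit of p*log p is 0).
        contribution = -sum(p * math.log(p) for p in pmf[start:stop] if p > 0)
        features.append(contribution)
    return features
```

With `overlap=0` the contributions add up to the full spectral entropy, so a flat spectrum gives equal values per band, while a spectrum with a single sharp peak gives values near zero everywhere.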
Entropy Based Combination of Tandem Representations for Noise Robust ASR
In this paper, we present an entropy-based method to combine tandem representations of the recently proposed Phase AutoCorrelation (PAC) based features and Mel-Frequency Cepstral Coefficient (MFCC) features. PAC-based features, which are derived from a nonlinear transformation of autocorrelation coefficients and have been shown to be noise robust, become more robust to additive noise in their tandem representation. MFCC features in their tandem representation, on the other hand, show a significant improvement in recognition performance on clean speech. The entropy-based combination method investigated in this paper adaptively gives a higher weight to the MFCC representation in clean speech and to the PAC representation in noisy speech, thus yielding robust recognition performance in all conditions.
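One common way to realize an adaptive, entropy-driven weighting of two posterior streams is inverse-entropy weighting, where the stream with the more confident (lower-entropy) posterior vector receives the larger weight for that frame. The sketch below assumes this scheme; the abstract does not state the paper's exact weighting rule, and the function names and the `eps` guard are hypothetical.

```python
import math


def posterior_entropy(posteriors):
    """Shannon entropy (in nats) of one frame's class posteriors."""
    return -sum(p * math.log(p) for p in posteriors if p > 0)


def inverse_entropy_combine(post_a, post_b, eps=1e-10):
    """Combine two posterior vectors for the same frame.

    Assumed scheme: weights proportional to 1/entropy, normalized to
    sum to one. `eps` avoids division by zero for one-hot posteriors.
    """
    if len(post_a) != len(post_b):
        raise ValueError("posterior vectors must have the same length")
    w_a = 1.0 / (posterior_entropy(post_a) + eps)
    w_b = 1.0 / (posterior_entropy(post_b) + eps)
    norm = w_a + w_b
    w_a, w_b = w_a / norm, w_b / norm
    combined = [w_a * a + w_b * b for a, b in zip(post_a, post_b)]
    return combined, (w_a, w_b)
```

Because the weights sum to one, the combined vector is still a valid posterior distribution whenever both inputs are.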
Pushing the Envelope—Aside
Despite successes, there are still significant limitations to speech recognition performance, particularly for conversational speech and for speech with significant acoustic degradations from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that we and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization (by cepstral mean subtraction, as described by B. Atal (1974), or by relative spectral analysis (RASTA), as described by H. Hermansky and N. Morgan (1994)). They have also proposed features that are based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
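Cepstral mean subtraction, mentioned above as a long-time-range normalization, can be sketched in a few lines: the per-coefficient mean over all frames of an utterance is subtracted from every frame. A fixed linear channel appears as a constant additive offset in the cepstral domain, so this subtraction removes it. The function name and the list-of-lists input format are assumptions for this example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient utterance mean from each frame.

    `frames` is assumed to be a list of equal-length cepstral vectors,
    one per analysis frame of a single utterance.
    """
    if not frames:
        return []
    num_coeffs = len(frames[0])
    num_frames = len(frames)
    # Mean of each cepstral coefficient over the whole utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(num_coeffs)]
    return [[f[k] - means[k] for k in range(num_coeffs)] for f in frames]
```

Adding the same offset vector to every input frame leaves the output unchanged, which is the property that makes the method useful against stationary channel effects.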